# Team NAN RISC-V Processor

Naveen Nathan, Neo Vasudeva, Anchit Rao

### Overview & Advanced Features

We made three main improvements to the baseline RISC-V processor:

- Perceptron Branch Prediction with 4-way Set Associative BTB
- Conversion from 8-Way Direct Mapped L1 Cache to 16-Way Direct Mapped L1 Cache
- 8-way Set Associative L2 Cache



# Perceptron Branch Prediction & 4-Way Set Associative BTB Cache

### Description

Global Branch History Length: 12

Table Size: 32 Perceptrons

### Improvements

- Heavy usage of BTB with 84% hit rate.
- Prediction accuracy above 60% for conditional branching.
- Prediction accuracy above 69% for unconditional branching.

|        | BTB %  | Conditional<br>Branch % | Unconditional<br>Branch % |
|--------|--------|-------------------------|---------------------------|
| COMP 1 | 84.80% | 71.87%                  | 78.61%                    |
| COMP 2 | 93.51% | 60.69%                  | 69.28%                    |
| COMP 3 | 98.75% | 75.60%                  | 86.91%                    |
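To make the predictor parameters above concrete, here is a minimal software sketch of a perceptron branch predictor with a 12-bit global history and a 32-entry perceptron table. Only those two numbers come from the slides. The training threshold (the commonly cited `1.93 * h + 14` rule), the 8-bit saturating weights, and the PC indexing are assumptions chosen for illustration, and the sketch models direction prediction only, not the BTB.

```python
HISTORY_LENGTH = 12    # global branch history length (from the slides)
NUM_PERCEPTRONS = 32   # perceptron table size (from the slides)

# Assumptions (not stated in the slides): training threshold and weight width.
THRESHOLD = int(1.93 * HISTORY_LENGTH + 14)
WEIGHT_MAX, WEIGHT_MIN = 127, -128


def clamp(value):
    """Saturate a weight to the assumed 8-bit signed range."""
    return max(WEIGHT_MIN, min(WEIGHT_MAX, value))


class PerceptronPredictor:
    def __init__(self):
        # weights[i][0] is the bias; weights[i][1:] pair with the history bits.
        self.weights = [[0] * (HISTORY_LENGTH + 1) for _ in range(NUM_PERCEPTRONS)]
        # History is stored as +1 (taken) / -1 (not taken), newest first.
        self.history = [-1] * HISTORY_LENGTH

    def _index(self, pc):
        # Assumption: drop the 2 alignment bits, then index modulo the table size.
        return (pc >> 2) % NUM_PERCEPTRONS

    def _output(self, pc):
        w = self.weights[self._index(pc)]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the perceptron output is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the resolved outcome, then shift it into the global history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= THRESHOLD:
            w = self.weights[self._index(pc)]
            w[0] = clamp(w[0] + t)
            for i, hi in enumerate(self.history, start=1):
                w[i] = clamp(w[i] + t * hi)
        self.history = [t] + self.history[:-1]
```

In this sketch, a branch that alternates between taken and not taken is learned through the weight on the most recent history bit, which is the kind of correlation a simple per-branch counter cannot capture.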

# Perceptron Branch Prediction w/o BTB

## Performance Decrease of 1.64%

when compared to the baseline implementation.

|                | Baseline  |           |           | Perceptron |           |           |
|----------------|-----------|-----------|-----------|------------|-----------|-----------|
|                | COMP 1    | COMP 2    | COMP 3    | COMP 1     | COMP 2    | COMP 3    |
| F_max          |           | 104.94    |           |            | 101.10    |           |
| Power<br>(mW)  |           | 460.74    |           |            | 541.04    |           |
| Time<br>(ns)   | 2,296,295 | 7,017,795 | 4,409,605 | 2,114,875  | 6,743,795 | 4,313,505 |
| Total<br>Score |           | 3.27E-08  |           |            | 3.33E-08  |           |

# 8-Way Set Associative L2 Cache (Adv. Feature) & 16-way Direct Mapped L1 Cache (Improvement)

#### Description

Data Line Size: 256 bits

Tag Size: 24 bits

LRU: 7 bits

Valid / Dirty: 1 bit each
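Seven LRU bits per set is exactly the state needed for a tree pseudo-LRU policy over 8 ways (one bit per internal node of a binary tree). The slides do not name the replacement scheme, so the sketch below assumes tree pseudo-LRU; the class and method names are hypothetical.

```python
NUM_WAYS = 8  # L2 associativity (from the slides)


class TreePLRU:
    """Tree pseudo-LRU state for one 8-way set: 7 bits, one per internal node."""

    def __init__(self):
        # bits[0] is the root; node n has children 2n+1 and 2n+2.
        # A bit of 0 sends the victim search left, a bit of 1 sends it right.
        self.bits = [0] * (NUM_WAYS - 1)

    def touch(self, way):
        """Record an access by pointing every node on the path away from `way`."""
        node, lo, hi = 0, 0, NUM_WAYS
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1   # accessed the left half, so evict from the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0   # accessed the right half, so evict from the left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to the way that should be replaced next."""
        node, lo, hi = 0, 0, NUM_WAYS
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

A hit or fill would call `touch(way)`, and a miss in a full set would evict `victim()`. The policy only approximates true LRU, but it guarantees that the most recently used way is never chosen.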

#### Improvements

- An inclusive L2 cache is less complex to implement than an exclusive one.
- Drastically reduced main-memory accesses due to the high L2 hit rate.

|        | L2 Cache Hit % |
|--------|----------------|
| COMP 1 | 96.96%         |
| COMP 2 | 98.90%         |
| COMP 3 | 93.05%         |


## Performance Improvement of 3209.86%

when compared to our baseline implementation.

|                | Baseline  |           |           | L2 Cache + 16-Way L1 |           |           |
|----------------|-----------|-----------|-----------|----------------------|-----------|-----------|
|                | COMP 1    | COMP 2    | COMP 3    | COMP 1               | COMP 2    | COMP 3    |
| F_max          |           | 104.94    |           |                      | 102.28    |           |
| Power<br>(mW)  |           | 460.74    |           |                      | 686.15    |           |
| Time<br>(ns)   | 2,296,295 | 7,017,795 | 4,409,605 | 695,385              | 1,830,825 | 1,132,355 |
| Total<br>Score |           | 3.27E-08  |           |                      | 9.89E-10  |           |
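The slides do not state how Total Score or the improvement percentage is calculated. The figures in the table above are consistent with a score of power (in watts) multiplied by the product of the three run times (in seconds), where lower is better, and with an improvement of `baseline / new - 1`. The sketch below works through that arithmetic under those assumptions; the function names are hypothetical.

```python
from math import prod


def total_score(power_mw, times_ns):
    """Assumed score: power in watts times the product of the run times in seconds."""
    return (power_mw * 1e-3) * prod(t * 1e-9 for t in times_ns)


def improvement_percent(baseline_score, new_score):
    """Assumed improvement metric: how many times lower the new score is, minus one."""
    return (baseline_score / new_score - 1.0) * 100.0


# Power (mW) and times (ns) are taken from the table above.
baseline = total_score(460.74, [2_296_295, 7_017_795, 4_409_605])
l2_design = total_score(686.15, [695_385, 1_830_825, 1_132_355])

print(f"baseline score:  {baseline:.2E}")
print(f"L2 design score: {l2_design:.2E}")
print(f"improvement:     {improvement_percent(baseline, l2_design):.2f}%")
```

Because the three run times are multiplied together, a speedup on every benchmark compounds, which is why a roughly 3x to 4x reduction in each time outweighs the higher power draw.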

## Overall Performance Improvements

## Performance Improvement of 3563.73%

when compared to our baseline implementation.

|                | Baseline  |           |           | Final Processor |           |           |
|----------------|-----------|-----------|-----------|-----------------|-----------|-----------|
|                | COMP 1    | COMP 2    | COMP 3    | COMP 1          | COMP 2    | COMP 3    |
| F_max          |           | 104.94    |           |                 | 89.75     |           |
| Power<br>(mW)  |           | 460.74    |           |                 | 704.67    |           |
| Time<br>(ns)   | 2,296,295 | 7,017,795 | 4,409,605 | 659,405         | 1,606,455 | 1,074,455 |
| Total<br>Score |           | 3.27E-08  |           |                 | 8.94E-10  |           |

# Potential Improvements for Design Choices

- Implement BRAM memory for faster accesses
- Implement a tournament branch predictor for better branch prediction accuracy
- START EARLIER

# Thank you for a great semester!

**TEAM NAN:** Naveen, Anchit, Neo